IEEE Transactions on Big Data, pp. 1-15, 2022.
Article in English | Scopus | ID: covidwho-2052080

ABSTRACT

Tracking the evolution of clusters in social media streams is becoming increasingly important for many applications, such as early detection and monitoring of natural disasters or pandemics. In contrast to clustering on a static set of data, streaming data clustering does not have a global view of the complete data. The local (or partial) view in a high-speed stream makes clustering a challenging task. In this paper, we propose a novel density-peak-based algorithm, TStream, for tracking the evolution of clusters and outliers in social media streams via the evolutionary actions of cluster adjustment, emergence, disappearance, split, and merge. TStream is based on a temporal decay model and text stream summarisation. The decay model captures the decreasing importance of textual documents over time. The stream summarisation compactly represents the documents in memory with the help of cells (also known as micro-clusters). We also propose a novel, efficient index called the shared dependency tree (SD-Tree), based on the ideas of density peak and shared dependency. It maintains the dynamic dependency relationships in TStream and thereby improves overall efficiency. We conduct extensive experiments on five real datasets. TStream outperforms the existing state-of-the-art solutions based on MStream, MStreamF, EDMStream, OSGM, and EStream in terms of cluster mapping measure (CMM) by up to 17.8%, 18.6%, 6.9%, 16.4%, and 20.1%, respectively. It is also significantly more efficient than MStream, MStreamF, OSGM, and EStream in terms of response time and throughput.
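The abstract names a temporal decay model, cells, and density-peak dependencies but does not give their definitions. The following is a minimal illustrative sketch, not the paper's implementation. It assumes exponential decay and the standard density-peak rule that each cell depends on its nearest cell of higher density. The names `decayed_weight`, `Cell`, `nearest_denser`, and the `decay_rate` parameter are all hypothetical.

```python
import math


def decayed_weight(age, decay_rate=0.01):
    """Weight of a document that is `age` time units old (assumed exponential decay)."""
    return math.exp(-decay_rate * age)


class Cell:
    """Hypothetical micro-cluster summary: a decayed density and its last update time."""

    def __init__(self, now):
        self.density = 0.0
        self.last_update = now

    def add(self, now, decay_rate=0.01):
        # Fade the stored density to the current time, then count the new document.
        faded = self.density * decayed_weight(now - self.last_update, decay_rate)
        self.density = faded + 1.0
        self.last_update = now


def nearest_denser(densities, dist):
    """Map each cell index to the nearest cell with strictly higher density.

    The densest cell has no such neighbour and maps to None.
    `dist(i, j)` is a caller-supplied distance between cells i and j.
    """
    deps = {}
    for i, di in enumerate(densities):
        higher = [j for j, dj in enumerate(densities) if dj > di]
        deps[i] = min(higher, key=lambda j: dist(i, j)) if higher else None
    return deps


# Usage: one cell receives documents at t=0 and t=100.
cell = Cell(now=0.0)
cell.add(now=0.0)
cell.add(now=100.0)

# Usage: three cells on a line with hypothetical positions and densities.
positions = [0.0, 1.0, 5.0]
deps = nearest_denser([3.0, 1.0, 2.0], lambda i, j: abs(positions[i] - positions[j]))
```

Under these assumptions, an older document contributes less to a cell's density than a fresh one, and the dependency map forms a forest rooted at the density peaks. How TStream shares and maintains these dependencies in the SD-Tree is not specified in the abstract.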
